1 Introduction

In extreme-value statistics, one of the main problems is the estimation of the
tail index associated with a random variable $Y$. This parameter, denoted by $\gamma$,
drives the tail heaviness of the distribution of $Y$. For instance, when $\gamma$ is positive,
the survival function of $Y$ decreases to zero polynomially, and the larger
$\gamma$ is, the slower the convergence. We refer to [17] for a comprehensive
treatment of extreme-value methodology in various frameworks and to [12]
for an overview of the numerous works dedicated to the estimation of the
tail index. Here, we focus on the situation where some covariate information
$x$ is recorded simultaneously with the quantity of interest $Y$. In the general
case, the tail heaviness of $Y$ given $x$ depends on $x$, and thus the tail index
is a function $\gamma(x)$ of the covariate. Such situations occur for instance in
climatology, where one may be interested in how climate change over the years
might affect extreme temperatures. Here, the covariate is univariate (the
time). Bivariate examples include the study of extreme rainfall as a function
of the geographical location.

Only a few papers address the estimation of the conditional tail index. A
parametric approach is considered in [29], where a linear trend is fitted to the
mean of an extreme-value distribution. We refer to [13] for other examples
of parametric models. More recently, Hall and Tajvidi [23] proposed to mix
a nonparametric estimation of the trend with a parametric assumption on
$Y$ given $x$. We also refer to [5], where a kind of semi-parametric estimator
is introduced for $\gamma(\psi(\beta'x))$, where $\psi$ is a known link function and $\beta$ is interpreted as a vector of regression coefficients. Fully nonparametric estimators
are introduced in [13], where a local polynomial fitting of the extreme-value
distribution to the extreme observations is used. In a similar spirit, spline
estimates are fitted in [10] through a penalized maximum likelihood method.
In both cases, the authors focus on univariate covariates and on the finite
sample properties of the estimators. These results are extended in [6], where
local polynomial estimates are proposed for multivariate covariates and
where their asymptotic properties are established for very regular functions
$\gamma(x)$ (at least twice continuously differentiable).

Similarly to these authors, we investigate how to combine nonparametric
smoothing techniques with extreme-value methods in order to obtain efficient estimators of $\gamma(x)$. The proposed estimator is based on a selection,
by means of a moving window approach, of the observations to be used in the
estimator of the extreme-value index. This estimator is a weighted sum of
the rescaled log-spacings between the selected largest observations. This
approach has several advantages. From the theoretical point of view, very
few assumptions are made on the regularity of $\gamma(x)$ and on the nature of
the covariate. A central limit theorem is established for the proposed estimator, without assuming that $x$ is finite dimensional. As an example, we
provide the asymptotic rate of convergence for Lipschitzian functions $\gamma(x)$
and multidimensional covariates $x$. From the practical point of view, the
estimator is easy to compute since it is in closed form and thus does not require
any optimization procedure.

Our family of nonparametric estimators is defined in Section 2. In Section 3, asymptotic normality properties are established, and links with nonparametric regression and standard extreme-value theory (without covariate
information) are highlighted. The choice of weights is discussed in Section 4. We first present two classical choices of weights extending the Hill [26]
and Zipf [27, 28] estimators to the conditional case. Next, we address the
problem of obtaining minimum variance and/or unbiased estimators, based
on the knowledge of a second-order parameter. The practical difficulties
arising when this parameter is unknown are also discussed. An illustration
on real data is provided in Section 5. Proofs are postponed to Section 6.
2 Estimators of the conditional tail index



Let $E$ be a metric space associated with a metric $d$. We assume that the
conditional distribution function of $Y$ given $x \in E$ is

$$F(y,x) = 1 - y^{-1/\gamma(x)} L(y,x), \qquad (1)$$

where $\gamma(\cdot)$ is an unknown positive function of the covariate $x$ and, for $x$
fixed, $L(\cdot,x)$ is a slowly varying function, i.e. for all $\lambda > 0$,

$$\lim_{y\to\infty} \frac{L(\lambda y,x)}{L(y,x)} = 1.$$

Given a sample $(Y_1,x_1),\dots,(Y_n,x_n)$ of independent observations from (1),
our aim is to build a pointwise estimator of the function $\gamma(\cdot)$. More precisely,
for a given $t \in E$, we want to estimate $\gamma(t)$, focusing on the case where the
design points $x_1,\dots,x_n$ are nonrandom. To this end, for all $r > 0$, let us
denote by $B(t,r)$ the ball centered at point $t$ and with radius $r$ defined by

$$B(t,r) = \{x \in E,\ d(x,t) \le r\}$$

and let $h_{n,t}$ be a positive sequence tending to zero as $n$ goes to infinity. The
proposed estimator uses a moving window approach since it is based on the
response variables $Y_i$'s for which the associated covariates $x_i$'s belong to the
ball $B(t,h_{n,t})$. The proportion of such design points is thus defined by

$$\varphi(h_{n,t}) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{I}\{x_i \in B(t,h_{n,t})\}$$

and plays an important role in this study. It describes how the design points
concentrate in the neighborhood of $t$ when $h_{n,t}$ goes to zero, similarly to what the
small ball probability does; see for instance the monograph on functional
data analysis [19]. Thus, the nonrandom number of observations in $[0,\infty) \times
B(t,h_{n,t})$ is given by $m_{n,t} = n\varphi(h_{n,t})$. Let $\{Z_i(t),\ i = 1,\dots,m_{n,t}\}$ be the
response variables $Y_i$'s for which the associated covariates $x_i$'s belong to the
ball $B(t,h_{n,t})$ and let $Z_{1,m_{n,t}}(t) \le \dots \le Z_{m_{n,t},m_{n,t}}(t)$ be the corresponding
order statistics. Our family of estimators of $\gamma(t)$ is defined by

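$$\hat\gamma_n(t,W) = \sum_{i=1}^{k_{n,t}} i \log\left(\frac{Z_{m_{n,t}-i+1,m_{n,t}}(t)}{Z_{m_{n,t}-i,m_{n,t}}(t)}\right) W(i/k_{n,t},t) \Bigg/ \sum_{i=1}^{k_{n,t}} W(i/k_{n,t},t), \qquad (2)$$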

where $k_{n,t}$ is a sequence of integers such that $1 \le k_{n,t} < m_{n,t}$ and $W(\cdot,t)$ is
a function defined on $(0,1)$ such that $\int_0^1 W(s,t)\,ds \neq 0$. Thus, without loss
of generality, we can assume that $\int_0^1 W(s,t)\,ds = 1$. Note that this family
of estimators is an extension of estimators proposed in [3] in the situation
where there is no covariate information. In this latter case, we also refer
to [11] for the definition of kernel estimators based on non-increasing and nonnegative functions, and to [21] for a similar work dedicated to Weibull tail-distributions. In [31], Viharos discusses the choice of the weight function to
obtain universal asymptotic normality of the corresponding weighted least-squares estimator.
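For readers who want to experiment, estimator (2) can be sketched in a few lines of NumPy. The sketch below is ours, not the authors' code: it assumes a finite-dimensional covariate stored as an array, uses the sup-norm for the metric $d$ (an assumption, the text allows any metric), and the function name `conditional_tail_index` is hypothetical. The default weight $W = 1$ gives the conditional Hill estimator; any vectorized function of $s = i/k_{n,t}$ can be passed instead.

```python
import numpy as np

def conditional_tail_index(y, x, t, h, k, weight=None):
    """Sketch of the moving-window estimator (2) of gamma(t).

    y      : (n,) positive responses Y_i
    x      : (n,) or (n, p) design points x_i
    t      : (p,) point at which gamma(t) is estimated
    h      : window radius h_{n,t}
    k      : number k_{n,t} of rescaled log-spacings
    weight : vectorized W(., t) on (0, 1]; None means W = 1 (conditional Hill)
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    t = np.asarray(t, dtype=float).reshape(1, -1)
    # Sup-norm distance: an assumption of this sketch, any metric d would do.
    dist = np.max(np.abs(x - t), axis=1)
    z = np.sort(y[dist <= h])          # Z_{1,m} <= ... <= Z_{m,m} in B(t, h)
    m = z.size
    if not 1 <= k < m:
        raise ValueError(f"need 1 <= k < m, got k={k}, m={m}")
    i = np.arange(1, k + 1)
    # Rescaled log-spacings C_{i,n}(t) = i * log(Z_{m-i+1,m} / Z_{m-i,m})
    c = i * np.log(z[m - i] / z[m - i - 1])
    s = i / k
    w = np.ones_like(s) if weight is None else np.asarray(weight(s), dtype=float)
    return float(np.sum(c * w) / np.sum(w))
```

For instance, on an exact Pareto sample with $\gamma = 0.5$ and a covariate uniform on $[0,1]$, `conditional_tail_index(y, x, [0.5], 0.25, 1000)` should return a value close to 0.5, and passing `weight=lambda s: -np.log(s)` uses a Zipf-type weight instead.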

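We also introduce the following extended family of estimators:

$$\hat\gamma_n(t,\mu^W) = \sum_{i=1}^{k_{n,t}} i \log\left(\frac{Z_{m_{n,t}-i+1,m_{n,t}}(t)}{Z_{m_{n,t}-i,m_{n,t}}(t)}\right) \mu^W_{i,n}(t) \Bigg/ \sum_{i=1}^{k_{n,t}} \mu^W_{i,n}(t), \qquad (3)$$

where the weights $\mu^W_{i,n}(t)$ are defined by $\mu^W_{i,n}(t) = W(i/k_{n,t},t)(1+o(1))$
uniformly in $i = 1,\dots,k_{n,t}$.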



3 Main results 

We first give all the conditions required to obtain the asymptotic normality
of our estimators. In the sequel, we fix $t \in E$ such that $\gamma(t) > 0$.



Assumptions on the conditional distribution. Let $x \in E$ be fixed.
Then, model (1) is well known to be equivalent to the so-called first-order
condition

$$U(y,x) \stackrel{\text{def}}{=} \inf\{s;\ F(s,x) \ge 1 - 1/y\} = y^{\gamma(x)}\,\ell(y,x), \qquad (4)$$

where, for $x$ fixed, $\ell(\cdot,x)$ is a slowly varying function. The function $U(\cdot,x)$
is said to be regularly varying with index $\gamma(x)$. We refer to [7] for a detailed
account on this topic. The conditions are:

(A.1) The conditional cumulative distribution function $F(\cdot,t)$ is continuous.

(A.2) There exist positive constants $c_U$, $z_U$ and $\alpha_U \le 1$ such that for all
$x \in B(t,1)$,

$$\sup_{z \ge z_U}\left|\frac{\log U(z,x)}{\log U(z,t)} - 1\right| \le c_U\, d^{\alpha_U}(x,t).$$

(A.3) There exist a negative function $\rho(t)$ and a rate function $b(\cdot,t)$ satisfying $b(y,t) \to 0$ as $y \to \infty$, such that for all $\lambda \ge 1$,

$$\log\left(\frac{\ell(\lambda y,t)}{\ell(y,t)}\right) = b(y,t)\,\rho(t)^{-1}\left(\lambda^{\rho(t)} - 1\right)(1 + o(1)),$$

where "$o$" is uniform in $\lambda \ge 1$ as $y \to \infty$.
Conditions (A.1) and (A.2) are regularity conditions on the conditional distribution function. The second-order condition (A.3) on the slowly varying
function is the cornerstone to establish the asymptotic normality of tail index estimators. It is used in [25] to prove the asymptotic normality of the
Hill estimator and in [3] for one of its refinements. The second-order parameter $\rho(t) < 0$ tunes the rate of convergence of $\ell(\lambda y,t)/\ell(y,t)$ to 1. The closer
$\rho(t)$ is to 0, the slower the convergence. The function $b(\cdot,t)$ is usually
called the bias function, since it drives the asymptotic behavior of most tail
index estimators. It can be shown that necessarily $b(\cdot,t)$ is regularly varying
with index $\rho(t)$ (see [22]).



Assumptions on the weights. The next assumption was first introduced
in [3] to establish exponential approximations for the log-spacings between
extreme order statistics.

(B.1) The function $s \mapsto sW(s,t)$ is absolutely continuous, i.e. there exists a
function $u(\cdot,t)$ defined on $(0,1)$ such that

$$sW(s,t) = \int_0^s u(\eta,t)\,d\eta \qquad (5)$$

with, for all $j = 1,\dots,k_{n,t}$,

$$k_{n,t}\left|\int_{(j-1)/k_{n,t}}^{j/k_{n,t}} u(\eta,t)\,d\eta\right| \le g\left(\frac{j}{k_{n,t}+1},t\right), \qquad (6)$$

where $g(\cdot,t)$ is a positive continuous function defined on $(0,1)$ and
satisfying

$$\int_0^1 \max(1,\log(1/s))\,g(s,t)\,ds < \infty. \qquad (7)$$

(B.2) There exists a constant $\delta > 0$ such that $\int_0^1 |W(s,t)|^{2+\delta}\,ds < \infty$.



Assumptions on the sequences $k_{n,t}$ and $h_{n,t}$. We assume that $k_{n,t}$ is
an intermediate sequence, which is a classical assumption in extreme-value
analysis:

(C) $n\varphi(h_{n,t})/k_{n,t} \to \infty$ and $k_{n,t} \to \infty$.

Remark that (C) implies $n\varphi(h_{n,t}) \to \infty$, i.e. the number of points in $[0,\infty) \times
B(t,h_{n,t})$ goes to infinity as the total number of points does.
In order to simplify the notations, let

$$b_{n,t} \stackrel{\text{def}}{=} b\left(\frac{n\varphi(h_{n,t})}{k_{n,t}},\,t\right)$$
and introduce the rescaled log-spacings

$$C_{i,n}(t) = i \log\left(\frac{Z_{m_{n,t}-i+1,m_{n,t}}(t)}{Z_{m_{n,t}-i,m_{n,t}}(t)}\right), \quad i = 1,\dots,k_{n,t},$$

such that estimator (2) can be rewritten as

$$\hat\gamma_n(t,W) = \sum_{i=1}^{k_{n,t}} C_{i,n}(t)\,W(i/k_{n,t},t) \Bigg/ \sum_{i=1}^{k_{n,t}} W(i/k_{n,t},t).$$

Besides, in the following, each vector $\{v_{i,n},\ i = 1,\dots,k_{n,t}\}$ is denoted by
$\{v_{i,n}\}_i$. Our first main result establishes the exponential regression model
for $\{C_{i,n}(t)\}_i$.



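Theorem 1 Suppose (A.1), (A.2), (A.3), (B.1) and (C) hold. Then,
the random vector $\{C_{i,n}(t)\}_i$ has the same distribution as

$$\left\{\left(\left(\gamma(t) + b_{n,t}\left(\frac{i}{k_{n,t}+1}\right)^{-\rho(t)}\right)F_i + \beta_{i,n}(t)\right)\left(1 + O_P(h_{n,t}^{\alpha_U})\right) + o_P(b_{n,t})\right\}_i$$

uniformly in $i = 1,\dots,k_{n,t}$, with

$$\frac{1}{k_{n,t}}\sum_{i=1}^{k_{n,t}} W(i/k_{n,t},t)\,\beta_{i,n}(t) = o_P(b_{n,t}),$$

and where $F_1,\dots,F_{k_{n,t}}$ are independent standard exponential variables.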



Similar results can be found in [15] for rescaled log-spacings of Weibull-type
random variables, and in [4] in the case of Pareto-type random variables
without covariate. We also refer to [16] for approximations of the Hill process
by sums of standard exponential random variables. In the conditional case,
i.e. when covariate information is available, only a few results exist. We
refer to [18], Theorem 3.5.2, for the approximation of the nearest neighbors
distribution using the Hellinger distance and to [20] for the study of their
asymptotic distribution. Our second main result establishes the asymptotic
normality of our estimators.
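The exponential representation of Theorem 1 can be checked by simulation in the simplest situation, where neither a covariate effect nor a second-order term is present: for an exact Pareto sample ($\ell = 1$, hence $b = 0$), Rényi's representation makes $C_{i,n}/\gamma$ exactly i.i.d. standard exponential. The sketch below (the helper name is hypothetical, and the values of $\gamma$, $m$, $k$ are arbitrary) compares empirical moments of simulated rescaled log-spacings with those of the standard exponential distribution. It does not illustrate the bias term $b_{n,t}(i/(k_{n,t}+1))^{-\rho(t)}$, which vanishes here.

```python
import numpy as np

def rescaled_log_spacings(z, k):
    """C_i = i * log(Z_{m-i+1,m} / Z_{m-i,m}) for i = 1, ..., k (requires k < m)."""
    z = np.sort(np.asarray(z, dtype=float))
    m = z.size
    i = np.arange(1, k + 1)
    return i * np.log(z[m - i] / z[m - i - 1])

rng = np.random.default_rng(1)
gamma, m, k, reps = 0.3, 5000, 500, 200
# Exact Pareto window samples: U(y) = y**gamma, so l = 1 and the bias term vanishes.
c = np.concatenate([
    rescaled_log_spacings(rng.uniform(size=m) ** -gamma, k) / gamma
    for _ in range(reps)
])
# For a standard exponential law: mean 1, variance 1 and P(F > 1) = exp(-1).
print(c.mean(), c.var(), np.mean(c > 1.0), np.exp(-1.0))
```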

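Theorem 2 Suppose (A.1), (A.2), (A.3), (B.1), (B.2) and (C) hold.
If, moreover,

$$k_{n,t}^{1/2}\,b_{n,t} \to \lambda(t) \in \mathbb{R} \quad\text{and}\quad k_{n,t}^{1/2}\,h_{n,t}^{\alpha_U} \to 0, \qquad (8)$$

then

$$k_{n,t}^{1/2}\left(\hat\gamma_n(t,W) - \gamma(t) - b_{n,t}\,AB(t,W)\right) \xrightarrow{d} \mathcal{N}\left(0,\ \gamma^2(t)\,AV(t,W)\right), \qquad (9)$$

where we have defined

$$AB(t,W) = \int_0^1 W(s,t)\,s^{-\rho(t)}\,ds \quad\text{and}\quad AV(t,W) = \int_0^1 W^2(s,t)\,ds.$$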

It appears that the asymptotic bias involves two parts. The first one is
given by $b_{n,t}$ and thus depends on the original distribution itself. The second one is given by $AB(t,W)$. This multiplicative factor can be made small
by an appropriate choice of the weight function $W$, see the next section.
Similarly, the variance term is inversely proportional to $k_{n,t}$, the number of
observations used to build the estimator, and the multiplicative coefficient
$\gamma^2(t)AV(t,W)$ can also be adjusted. When $\lambda(t) \neq 0$, the first part of condition (8) forces the bias to be of the same order as the standard deviation.
The second part, $k_{n,t}^{1/2}h_{n,t}^{\alpha_U} \to 0$, is due to the functional nature of the tail index to be estimated. It requires the fluctuations of $t \mapsto U(\cdot,t)$ to be negligible
compared to the standard deviation of the estimator.
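Since $AB(t,W)$ and $AV(t,W)$ are one-dimensional integrals, they are easy to evaluate numerically for a candidate weight. The following sketch (helper name `ab_av` hypothetical) uses a plain midpoint rule on $(0,1)$, which copes with the integrable logarithmic singularity of weights such as $-\log(s)$; the grid size is an arbitrary choice.

```python
import numpy as np

def ab_av(weight, rho, n_grid=1_000_000):
    """Midpoint-rule approximations of AB(t, W) and AV(t, W) for a fixed t.

    weight : vectorized function s -> W(s, t) on (0, 1)
    rho    : second-order parameter rho(t) < 0
    """
    s = (np.arange(n_grid) + 0.5) / n_grid   # midpoints of a uniform grid on (0, 1)
    w = weight(s)
    ab = np.mean(w * s ** (-rho))            # int_0^1 W(s,t) s^{-rho(t)} ds
    av = np.mean(w ** 2)                     # int_0^1 W(s,t)^2 ds
    return float(ab), float(av)

# Hill weight W = 1 with rho = -1: AB = 1/(1 - rho) = 0.5 and AV = 1.
print(ab_av(lambda s: np.ones_like(s), rho=-1.0))
# Weight -log(s) with rho = -1: AB = 1/(1 - rho)^2 = 0.25 and AV = 2.
print(ab_av(lambda s: -np.log(s), rho=-1.0))
```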

The following result establishes that the estimators of the extended family inherit the asymptotic distribution of the estimators in family (2).

Corollary 1 Under the assumptions of Theorem 2,

$$k_{n,t}^{1/2}\left(\hat\gamma_n(t,\mu^W) - \gamma(t) - b_{n,t}\,AB(t,W)\right) \xrightarrow{d} \mathcal{N}\left(0,\ \gamma^2(t)\,AV(t,W)\right). \qquad (10)$$

We now propose a precise evaluation of the rate of convergence obtained in
Theorem 2 in the particular framework of multidimensional nonparametric
regression.

Corollary 2 Let $E = \mathbb{R}^p$ and suppose (B.1), (B.2) hold. If, moreover, $\gamma$ is
$\alpha$-Lipschitzian, the slowly varying function $L$ in (1) is such that $L(y,x) = 1$
for all $(y,x) \in \mathbb{R}_+ \times \mathbb{R}^p$ and

$$\liminf_{n\to\infty} \varphi(h_{n,t})/h_{n,t}^p > 0, \qquad (11)$$

then the convergence in distribution (9) holds with rate $n^{\alpha/(p+2\alpha)}\eta_n$, where $\eta_n \to 0$
arbitrarily slowly.

Condition (11) is an assumption on the multidimensional design and on
the distance $d$. Lemma 3 in Section 6 provides an example of design fulfilling this assumption. Under the condition $L(y,x) = 1$ for all $(y,x) \in
\mathbb{R}_+ \times \mathbb{R}^p$, estimating $\gamma(x)$ is a nonparametric regression problem since $\gamma(x) =
\mathbb{E}(\log Y \mid X = x)$. Let us highlight that the convergence rate provided by
Corollary 2 is, up to the $\eta_n$ factor, the optimal convergence rate for estimating an $\alpha$-Lipschitzian regression function in $\mathbb{R}^p$, see [30].
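The arithmetic behind the rate of Corollary 2 can be checked directly. Assuming the choices $h_{n,t} = n^{-1/(p+2\alpha)}$ and $k_{n,t} = n^{2\alpha/(p+2\alpha)}\eta_n^2$ (the squared $\eta_n$ is what makes $k_{n,t}^{1/2}$ equal to the stated rate $n^{\alpha/(p+2\alpha)}\eta_n$), the sketch evaluates $k_{n,t}^{1/2}$, the term $k_{n,t}^{1/2}h_{n,t}^{\alpha}$ of (8) and the ratio $nh_{n,t}^p/k_{n,t}$ appearing in (C). The helper name and the numerical values of $n$, $p$, $\alpha$ and $\eta_n$ are illustrative only.

```python
import math

def corollary2_tuning(n, p, alpha, eta):
    """Window h and number k of log-spacings used for the rate of Corollary 2 (sketch)."""
    h = n ** (-1.0 / (p + 2 * alpha))
    k = n ** (2 * alpha / (p + 2 * alpha)) * eta ** 2
    return h, k

n, p, alpha, eta = 10**8, 2, 1.0, 0.5
h, k = corollary2_tuning(n, p, alpha, eta)
rate = n ** (alpha / (p + 2 * alpha)) * eta
print(math.sqrt(k), rate)            # k^{1/2} equals the rate n^{alpha/(p+2alpha)} eta
print(math.sqrt(k) * h ** alpha)     # equals eta, hence tends to 0 when eta_n -> 0
print(n * h ** p / k)                # equals eta^{-2}, hence tends to infinity
```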
4 Discussion on the choice of the weights



In order to illustrate the usefulness of our results, we first provide two examples of weights extending classical extreme-value index estimators to the presence
of covariates. Second, we propose some "optimal" choices of weights in
the theoretical situation where the second-order parameter $\rho(t)$ is known.
Finally, we give some ideas to overcome this restrictive assumption.



4.1 Two classical examples of weights 

We first introduce an adaptation of the Hill estimator to take into account
the covariate information. Considering in (2) the constant weight function
$W^H(s,t) = 1$ for all $s \in [0,1]$ yields

$$\hat\gamma_n(t,W^H) = \frac{1}{k_{n,t}}\sum_{i=1}^{k_{n,t}} i \log\left(\frac{Z_{m_{n,t}-i+1,m_{n,t}}(t)}{Z_{m_{n,t}-i,m_{n,t}}(t)}\right), \qquad (12)$$

which is formally the same expression as in [26]. Clearly, $W^H$ satisfies the assumptions (B.1) and (B.2), and then the asymptotic normality of $\hat\gamma_n(t,W^H)$
is a direct consequence of Theorem 2.

Corollary 3 Under (A.1), (A.2), (A.3), (C) and (8), the convergence
in distribution (9) holds for $\hat\gamma_n(t,W^H)$ with $AB(t,W^H) = 1/(1-\rho(t))$ and
$AV(t,W^H) = 1$.

Similarly, we define a Zipf estimator (proposed simultaneously by Kratz and
Resnick [27] and Schultze and Steinebach [28]) adapted to our framework.
Remarking that the pairs

$$\left(\tau_{i,n}(t) \stackrel{\text{def}}{=} \sum_{j=i}^{m_{n,t}}\frac{1}{j},\ \log\left(Z_{m_{n,t}-i+1,m_{n,t}}(t)\right)\right), \quad i = 1,\dots,m_{n,t},$$

are approximately distributed on a line of slope $\gamma(t)$, at least for small
values of $i$ and for $h_{n,t}$ close to zero, one can propose a least-squares estimator
based on the $k_{n,t}$ largest observations:

$$\hat\gamma_n(t,\mu^Z) = \sum_{i=1}^{k_{n,t}}\left(\tau_{i,n}(t) - \bar\tau_n(t)\right)\log\left(Z_{m_{n,t}-i+1,m_{n,t}}(t)\right) \Bigg/ \sum_{i=1}^{k_{n,t}}\left(\tau_{i,n}(t) - \bar\tau_n(t)\right)\tau_{i,n}(t), \qquad (13)$$

where $\bar\tau_n(t) = k_{n,t}^{-1}\sum_{i=1}^{k_{n,t}}\tau_{i,n}(t)$. Since (13) can be rewritten as



$$\hat\gamma_n(t,\mu^Z) = \sum_{i=1}^{k_{n,t}} i \log\left(\frac{Z_{m_{n,t}-i+1,m_{n,t}}(t)}{Z_{m_{n,t}-i,m_{n,t}}(t)}\right)\mu^Z_{i,n}(t) \Bigg/ \sum_{i=1}^{k_{n,t}}\mu^Z_{i,n}(t),$$

with

$$\mu^Z_{i,n}(t) = \frac{1}{i}\sum_{j=1}^{i}\left(\tau_{j,n}(t) - \bar\tau_n(t)\right) = -\log(i/k_{n,t})(1 + o(1)),$$

uniformly in $i = 1,\dots,k_{n,t}$ (see Section 6 for a proof), it appears that
this estimator belongs to the extended family (3) associated with the weight
function $W^Z(s,t) = -\log(s)$. Lemma 2 in Section 6 shows that condition
(B.1) is fulfilled with $g(s,t) = 1 - \log(s)$ and thus Corollary 1 yields

Corollary 4 Under (A.1), (A.2), (A.3), (C) and (8), the convergence
in distribution (10) holds for $\hat\gamma_n(t,\mu^Z)$ with $AB(t,W^Z) = 1/(1-\rho(t))^2$ and
$AV(t,W^Z) = 2$.
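The claim that the least-squares Zipf estimator (13) is a member of the extended family (3) can be verified numerically: on any sample, both formulas should agree up to rounding. The sketch below (function names hypothetical, simulated Pareto data with $\gamma = 0.4$ chosen arbitrarily) computes $\tau_{i,n} = \sum_{j=i}^{m}1/j$ by a reversed cumulative sum, evaluates (13) directly, recomputes the estimate with the weights $\mu^Z_{i,n} = i^{-1}\sum_{j\le i}(\tau_{j,n} - \bar\tau_n)$, and finally measures how far these weights are from $-\log(i/k)$.

```python
import numpy as np

def zipf_ls(z, k):
    """Least-squares Zipf estimator (13) on a window sample z; also returns tau - tau_bar."""
    z = np.sort(np.asarray(z, dtype=float))
    m = z.size
    i = np.arange(1, k + 1)
    # tau_all[i-1] = sum_{j=i}^{m} 1/j, computed as a reversed cumulative sum
    tau_all = np.cumsum(1.0 / np.arange(m, 0, -1))[::-1]
    tau = tau_all[:k]
    logz = np.log(z[m - i])                     # log Z_{m-i+1,m}
    dev = tau - tau.mean()
    return float(np.sum(dev * logz) / np.sum(dev * tau)), dev

def zipf_extended(z, k):
    """Same estimator written in the extended family (3) with weights mu^Z."""
    z = np.sort(np.asarray(z, dtype=float))
    m = z.size
    i = np.arange(1, k + 1)
    c = i * np.log(z[m - i] / z[m - i - 1])     # rescaled log-spacings
    _, dev = zipf_ls(z, k)
    mu = np.cumsum(dev) / i                     # mu_i = (1/i) sum_{j<=i} (tau_j - tau_bar)
    return float(np.sum(c * mu) / np.sum(mu)), mu

rng = np.random.default_rng(2)
z = rng.uniform(size=2000) ** -0.4              # exact Pareto sample, gamma = 0.4
k = 200
g_ls, _ = zipf_ls(z, k)
g_ext, mu = zipf_extended(z, k)
ratio_gap = np.max(np.abs(mu[:-1] / -np.log(np.arange(1, k) / k) - 1.0))
print(g_ls, g_ext, ratio_gap)
```

With $k = 200$, the largest relative gap between $\mu^Z_{i,n}$ and $-\log(i/k)$ over $i < k$ stays below 0.1 and is reached for the smallest values of $i$, which illustrates that the $(1+o(1))$ factor converges rather slowly.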

4.2 Theoretical choices of weights 

In this subsection, three problems are addressed: the definition of asymptotically unbiased estimators, of minimum variance estimators and of minimum
variance asymptotically unbiased estimators.

Asymptotically unbiased estimators. We propose to combine two weight
functions in order to cancel the asymptotic bias. More precisely, we use the
following result, whose proof is straightforward.

Proposition 1 Given two weight functions $W_1(\cdot,t)$ and $W_2(\cdot,t)$ satisfying
(B.1) and (B.2) and a function $\alpha(t)$ defined on $E$, the weight function
$\alpha(t)W_1(\cdot,t) + (1-\alpha(t))W_2(\cdot,t)$ also satisfies (B.1) and (B.2).

Hence, Theorem 2 entails that the asymptotic bias of the obtained estimator
is given by

$$b_{n,t}\left(\alpha(t)AB(t,W_1) + (1-\alpha(t))AB(t,W_2)\right).$$

Clearly, if $AB(t,W_1) \neq AB(t,W_2)$, choosing

$$\alpha(t) = \frac{AB(t,W_2)}{AB(t,W_2) - AB(t,W_1)} \qquad (14)$$

cancels the asymptotic bias. As an example, one can combine
the weights of the conditional Hill and Zipf estimators defined respectively
by (12) and (13) to obtain an asymptotically unbiased estimator $\hat\gamma_n(t,W^{HZ})$
with

$$W^{HZ}(s,t) = \frac{1}{\rho(t)} - \left(1 - \frac{1}{\rho(t)}\right)\log(s).$$

The following result is a direct consequence of the above results.

Corollary 5 Under (A.1), (A.2), (A.3), (C) and (8), the convergence in
distribution (9) holds for $\hat\gamma_n(t,W^{HZ})$ with $AB(t,W^{HZ}) = 0$ and $AV(t,W^{HZ}) =
1 + (1 - 1/\rho(t))^2$.
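Plugging $AB(t,W^H) = 1/(1-\rho(t))$ and $AB(t,W^Z) = 1/(1-\rho(t))^2$ into (14) gives $\alpha(t) = 1/\rho(t)$, hence the weight $W^{HZ}(s,t) = 1/\rho(t) - (1-1/\rho(t))\log(s)$. The sketch below checks numerically, for one illustrative value of $\rho(t)$, that this weight integrates to one, has zero asymptotic bias and has the variance factor of Corollary 5; the value $\rho = -1.5$, the helper names and the midpoint grid are arbitrary choices.

```python
import numpy as np

def w_hz(s, rho):
    """Hill/Zipf combination alpha * 1 + (1 - alpha) * (-log s) with alpha = 1/rho from (14)."""
    return 1.0 / rho - (1.0 - 1.0 / rho) * np.log(s)

def integrals(weight, rho, n_grid=1_000_000):
    """Midpoint-rule values of int W, AB(t, W) and AV(t, W)."""
    s = (np.arange(n_grid) + 0.5) / n_grid
    w = weight(s)
    return float(np.mean(w)), float(np.mean(w * s ** (-rho))), float(np.mean(w ** 2))

rho = -1.5
total, ab, av = integrals(lambda s: w_hz(s, rho), rho)
print(total, ab, av, 1 + (1 - 1 / rho) ** 2)   # expected: 1, 0, and AV twice
```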
Minimum variance estimator. It is also of interest to find the weights
minimizing the variance. The following result is the key tool to answer this
question.

Proposition 2 Let $t \in E$. The unique continuous function $W(\cdot,t)$ such
that $\int_0^1 W(s,t)\,ds = 1$ and minimizing $\int_0^1 W^2(s,t)\,ds$ is given by $W(s,t) = 1$
for all $s \in [0,1]$.

It thus appears that the conditional Hill estimator (12) is the unique minimum variance estimator in (2).

Asymptotically unbiased estimator with minimum variance. Finally, we provide the asymptotically unbiased estimator with minimum variance.

Proposition 3 Let $t \in E$. The unique continuous function $W(\cdot,t)$ such
that $\int_0^1 W(s,t)\,ds = 1$, $\int_0^1 W(s,t)\,s^{-\rho(t)}\,ds = 0$ and minimizing $\int_0^1 W^2(s,t)\,ds$
is given by

$$W^{opt}(s,t) = \frac{\rho(t)-1}{\rho^2(t)}\left(\rho(t) - 1 + (1-2\rho(t))\,s^{-\rho(t)}\right).$$

Remark that $W^{opt}(s,t) = \alpha(t)W_1(s,t) + (1-\alpha(t))W_2(s,t)$ with $W_1(s,t) = 1$
for all $s \in (0,1)$, $W_2(s,t) = (1-\rho(t))s^{-\rho(t)}$ and $\alpha(t) = (1-\rho(t))^2/\rho^2(t)$
defined as in (14). From Lemma 2, $W_1(\cdot,t)$ and $W_2(\cdot,t)$ both satisfy assumptions (B.1) and (B.2) with $g_1(s,t) = 1$ and $g_2(s,t) = (1-\rho(t))^2 s^{-\rho(t)}$.
Thus, Proposition 1 and Theorem 2 yield the following corollary:

Corollary 6 Under (A.1), (A.2), (A.3), (C) and (8), the convergence in
distribution (9) holds for $\hat\gamma_n(t,W^{opt})$ with $AB(t,W^{opt}) = 0$ and $AV(t,W^{opt}) =
(1 - 1/\rho(t))^2$.
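Proposition 3 can be sanity-checked in the same way: for a fixed $\rho(t)$, $W^{opt}$ should satisfy both constraints, and its variance factor should equal $(1-1/\rho(t))^2$, which is smaller than the factor $1+(1-1/\rho(t))^2$ of the Hill/Zipf combination. This only checks the constraints and the value of $AV$, not the optimality itself; the value of $\rho$, the helper name and the midpoint grid below are illustrative assumptions.

```python
import numpy as np

def w_opt(s, rho):
    """Minimum-variance asymptotically unbiased weight W^opt of Proposition 3."""
    return (rho - 1.0) / rho**2 * (rho - 1.0 + (1.0 - 2.0 * rho) * s ** (-rho))

rho = -1.5
s = (np.arange(1_000_000) + 0.5) / 1_000_000     # midpoint grid on (0, 1)
w = w_opt(s, rho)
total = float(np.mean(w))                        # int W ds, expected 1
ab = float(np.mean(w * s ** (-rho)))             # AB(t, W^opt), expected 0
av = float(np.mean(w ** 2))                      # AV(t, W^opt), expected (1 - 1/rho)^2
print(total, ab, av, (1 - 1 / rho) ** 2, 1 + (1 - 1 / rho) ** 2)
```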

Unsurprisingly, the estimators $\hat\gamma_n(t,W^{HZ})$ and $\hat\gamma_n(t,W^{opt})$ require the knowledge of the second-order parameter $\rho(t)$. The estimation of the function
$t \mapsto \rho(t)$ is beyond the scope of this paper; we refer to [1], [2], [23], [8] for
estimators of the second-order parameter when there is no covariate information. The definition of estimators of the second-order parameter with
covariates is part of our future work, as well as the study of the asymptotic
properties of the estimator of $\gamma(t)$ obtained by plugging in the estimate of $\rho(t)$.
Here, we limit ourselves to illustrating in the next subsection the effect of
using an arbitrarily chosen value.

4.3 Practical choice of weights 

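In this subsection, we study the behavior of the estimators $\hat\gamma_n(t,W^{HZ})$ and
$\hat\gamma_n(t,W^{opt})$ in which we replace the second-order parameter $\rho(t)$ by an arbitrary value $\rho^* < 0$. We then define $\hat\gamma_n(t,W^{HZ}_{\rho^*})$ and $\hat\gamma_n(t,W^{opt}_{\rho^*})$ with respective weights

$$W^{HZ}_{\rho^*}(s,t) = \frac{1}{\rho^*} - \left(1 - \frac{1}{\rho^*}\right)\log(s) \quad\text{and}\quad W^{opt}_{\rho^*}(s,t) = \frac{\rho^*-1}{\rho^{*2}}\left(\rho^* - 1 + (1-2\rho^*)\,s^{-\rho^*}\right).$$

Their asymptotic normality is a direct consequence of Theorem 2.

Corollary 7 Under (A.1), (A.2), (A.3), (C) and (8), the convergence in
distribution (9) holds for $\hat\gamma_n(t,W^{HZ}_{\rho^*})$ and $\hat\gamma_n(t,W^{opt}_{\rho^*})$ with

$$AB(t,W^{HZ}_{\rho^*}) = \frac{\rho^* - \rho(t)}{\rho^*(1-\rho(t))^2}, \qquad AV(t,W^{HZ}_{\rho^*}) = 1 + (1 - 1/\rho^*)^2,$$

$$AB(t,W^{opt}_{\rho^*}) = \frac{(1-\rho^*)(\rho^* - \rho(t))}{\rho^*(1-\rho(t))(1-\rho(t)-\rho^*)}, \qquad AV(t,W^{opt}_{\rho^*}) = (1 - 1/\rho^*)^2.$$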

The proof is a direct consequence of Theorem 2. It appears that a bias
is introduced in the asymptotic distribution. Let us also note that the
asymptotic biases of the estimators $\hat\gamma_n(t,W^{HZ}_{\rho^*})$ and $\hat\gamma_n(t,W^{opt}_{\rho^*})$ have the same
sign. In terms of variance, such a misspecification can yield an improvement, since $\rho^* < \rho(t)$ implies $AV(t,W^{opt}_{\rho^*}) < AV(t,W^{opt})$ and $AV(t,W^{HZ}_{\rho^*}) <
AV(t,W^{HZ})$, see Figure 1. The densities of the asymptotic distributions
of $\hat\gamma_n(t,W^{HZ}_{\rho^*})$ are represented for different choices of $\rho^*$ in the case of a Burr
distribution with extreme-value index $\gamma(t) = 0.3$ and second-order parameter $\rho(t) = -1$. Here, $m_{n,t} = 5000$ and $k_{n,t} = 500$, leading to $b_{n,t} \simeq
-0.08$. Clearly, choosing a small value of $\rho^*$ is better than choosing a
large one. In fact, it is easily seen that $AV(t,W^{HZ}_{\rho^*}) \to AV(t,W^Z)$ and
$AB(t,W^{HZ}_{\rho^*}) \to AB(t,W^Z)$ as $\rho^* \to -\infty$, whereas $AV(t,W^{HZ}_{\rho^*}) \to +\infty$ and
$|AB(t,W^{HZ}_{\rho^*})| \to +\infty$ as $\rho^* \to 0$. Similar conclusions hold for $\hat\gamma_n(t,W^{opt}_{\rho^*})$.
The consequences of the misspecification of the second-order parameter on
the relative efficiency are studied in [9] in the unconditional case.
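Integrating $W^{HZ}_{\rho^*}$ and $W^{opt}_{\rho^*}$ against $s^{-\rho(t)}$ gives closed forms for the biases induced by a misspecified $\rho^*$. The expressions coded below are our own direct computation (they vanish when $\rho^* = \rho(t)$, as they should); the sketch compares them with a midpoint-rule quadrature and illustrates that the two biases share the same sign. Function names are hypothetical and the pair $\rho(t) = -1$, $\rho^* = -2$ is only an example.

```python
import numpy as np

def w_hz(s, rho_star):
    """Hill/Zipf weight built with the guessed value rho*."""
    return 1 / rho_star - (1 - 1 / rho_star) * np.log(s)

def w_opt(s, rho_star):
    """Minimum-variance unbiased weight built with the guessed value rho*."""
    return (rho_star - 1) / rho_star**2 * (rho_star - 1 + (1 - 2 * rho_star) * s ** (-rho_star))

def ab_hz(rho, rho_star):
    """AB(t, W^HZ_{rho*}) by direct integration (closed form)."""
    return (rho_star - rho) / (rho_star * (1 - rho) ** 2)

def ab_opt(rho, rho_star):
    """AB(t, W^opt_{rho*}) by direct integration (closed form)."""
    return (1 - rho_star) * (rho_star - rho) / (rho_star * (1 - rho) * (1 - rho - rho_star))

def ab_numeric(weight, rho, n_grid=1_000_000):
    """Midpoint-rule value of int_0^1 W(s) s^{-rho} ds."""
    s = (np.arange(n_grid) + 0.5) / n_grid
    return float(np.mean(weight(s) * s ** (-rho)))

rho, rho_star = -1.0, -2.0
print(ab_hz(rho, rho_star), ab_numeric(lambda s: w_hz(s, rho_star), rho))
print(ab_opt(rho, rho_star), ab_numeric(lambda s: w_opt(s, rho_star), rho))
```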
From the practical point of view, the four estimators $\hat\gamma_n(t,W^H)$, $\hat\gamma_n(t,\mu^Z)$,
$\hat\gamma_n(t,W^{HZ}_{\rho^*})$ and $\hat\gamma_n(t,W^{opt}_{\rho^*})$ are easily implementable. The remainder of this
paragraph is devoted to their comparison. Simple calculations lead to the
following partition of the $(\rho,\rho^*)$ plane into 5 areas (see Figure 5), defined as

$A = \{\rho(t) < 0, \rho^* < 0 \mid \rho(t)/(2-\rho(t)) < \rho^*\}$, where

$$AB(t,W^Z) < AB(t,W^H) < |AB(t,W^{HZ}_{\rho^*})| < |AB(t,W^{opt}_{\rho^*})|,$$

$B = \{\rho(t) < 0, \rho^* < 0 \mid (1-\sqrt{1-2\rho(t)})/2 < \rho^* < \rho(t)/(2-\rho(t))\}$, where

$$AB(t,W^Z) < |AB(t,W^{HZ}_{\rho^*})| < AB(t,W^H) < |AB(t,W^{opt}_{\rho^*})|,$$

$C = \{\rho(t) < 0, \rho^* < 0 \mid \rho(t)/2 < \rho^* < (1-\sqrt{1-2\rho(t)})/2\}$, where

$$AB(t,W^Z) < |AB(t,W^{HZ}_{\rho^*})| < |AB(t,W^{opt}_{\rho^*})| < AB(t,W^H),$$

$D = \{\rho(t) < 0, \rho^* < 0 \mid \rho_1(t) < \rho^* < \rho(t)/2 \text{ or } \rho^* < \rho_2(t)\}$, where

$$|AB(t,W^{HZ}_{\rho^*})| < AB(t,W^Z) < |AB(t,W^{opt}_{\rho^*})| < AB(t,W^H),$$

$E = \{\rho(t) < 0, \rho^* < 0 \mid \rho_2(t) < \rho^* < \rho_1(t)\}$, where

$$|AB(t,W^{HZ}_{\rho^*})| < |AB(t,W^{opt}_{\rho^*})| < AB(t,W^Z) < AB(t,W^H),$$

and with the frontier functions

$$\rho_1(t) = \frac{(2+\rho(t))(\rho(t)-1) + \sqrt{(2+\rho(t))^2(\rho(t)-1)^2 - 4\rho(t)(\rho(t)-1)(\rho(t)-2)}}{2(\rho(t)-2)},$$

$$\rho_2(t) = \frac{\rho(t) - 1 - \sqrt{(1-\rho(t))^2 + 4(1-\rho(t))}}{2}.$$

Next, concerning the corresponding asymptotic variances, we have:
in the half-plane $N$ ($\rho^* > -1-\sqrt{2}$),

$$AV(t,W^H) < AV(t,W^Z) < AV(t,W^{opt}_{\rho^*}) < AV(t,W^{HZ}_{\rho^*}),$$

and in the half-plane $S$ ($\rho^* < -1-\sqrt{2}$),

$$AV(t,W^H) < AV(t,W^{opt}_{\rho^*}) < AV(t,W^Z) < AV(t,W^{HZ}_{\rho^*}).$$

These inequalities are summarized in Figure. For practical reasons, we
limit $\rho(t)$ to $[-10,0]$ and $\rho^*$ to $[-4,0]$. The dashed line represents the case
$\rho^* = \rho(t)$.
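The partition into areas A to E can be encoded as a small classifier and cross-checked against a direct comparison of the four absolute biases. In this sketch (function names hypothetical), $\rho_1$ denotes the upper frontier, lying between $\rho(t)$ and $\rho(t)/2$, and $\rho_2$ the lower one, lying below $\rho(t)$; the absolute biases use $AB(t,W^H)$ and $AB(t,W^Z)$ from Corollaries 3 and 4 and the closed forms of the misspecified biases. Points lying exactly on a frontier are not treated specially.

```python
import math

def rho1(r):
    """Upper frontier between areas D and E (lies between rho and rho/2)."""
    a = (2 + r) * (r - 1)
    disc = (2 + r) ** 2 * (r - 1) ** 2 - 4 * r * (r - 1) * (r - 2)
    return (a + math.sqrt(disc)) / (2 * (r - 2))

def rho2(r):
    """Lower frontier between areas D and E (lies below rho)."""
    return (r - 1 - math.sqrt((1 - r) ** 2 + 4 * (1 - r))) / 2

def area(r, rs):
    """Area (A to E) of the (rho, rho*) plane containing the point, for r, rs < 0."""
    if rs > r / (2 - r):
        return "A"
    if rs > (1 - math.sqrt(1 - 2 * r)) / 2:
        return "B"
    if rs > r / 2:
        return "C"
    if rho2(r) < rs < rho1(r):
        return "E"
    return "D"

def abs_biases(r, rs):
    """|AB| of the Hill, Zipf, W^HZ_{rho*} and W^opt_{rho*} weights."""
    return {
        "H": 1 / (1 - r),
        "Z": 1 / (1 - r) ** 2,
        "HZ": abs((rs - r) / (rs * (1 - r) ** 2)),
        "opt": abs((1 - rs) * (rs - r) / (rs * (1 - r) * (1 - r - rs))),
    }

# Ordering of the absolute biases, from smallest to largest, in each area.
ORDER = {"A": ["Z", "H", "HZ", "opt"], "B": ["Z", "HZ", "H", "opt"],
         "C": ["Z", "HZ", "opt", "H"], "D": ["HZ", "Z", "opt", "H"],
         "E": ["HZ", "opt", "Z", "H"]}

b = abs_biases(-1.0, -1.5)
print(area(-1.0, -1.5), sorted(b, key=b.get))   # an E point: HZ < opt < Z < H
```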

5 Illustration on real data 

In this section, we propose to illustrate our approach on the daily mean
discharges (in cubic meters per second) of the Chelmer river collected by
the Springfield gauging station, from 1969 to 2005. These data are provided
by the Centre for Ecology and Hydrology (United Kingdom) and are available at http://www.ceh.ac.uk/data/nrfa. In this context, the variable
of interest $Y$ is the daily flow of the river and the bidimensional covariate
$x = (x_1,x_2)$ is built as follows: $x_1 \in \{1969,1970,\dots,2005\}$ is the year of
measurement and $x_2 \in \{1,2,\dots,365\}$ is the day. The size of the dataset is
$n = 13{,}505$.

The smoothing parameter $h_{n,t}$ as well as the number of upper order
statistics $k_{n,t}$ are assumed to be independent of $t$; they are thus denoted
by $h_n$ and $k_n$ respectively. They are selected by minimizing the following
distance between the conditional Hill and Zipf estimators:

$$\min_{h_n,\,k_n}\ \max_{t\in T}\left|\hat\gamma_n(t,W^H) - \hat\gamma_n(t,\mu^Z)\right|,$$

where $T = \{1969,1970,\dots,2005\} \times \{15,45,\dots,345\}$. This heuristic is
commonly used in functional estimation and relies on the idea that, for a
properly chosen pair $(h_n,k_n)$, both estimates $\hat\gamma_n(t,W^H)$ and $\hat\gamma_n(t,\mu^Z)$ should
yield approximately the same value. The selected value of $h_n$ corresponds to
a smoothing over 4 years on $x_1$ and 2 months on $x_2$. Each ball $B(t,h_n)$, $t \in T$,
contains $m_n = n\varphi(h_n) = 1089$ points and $k_n = 54$ rescaled log-spacings
are used. This choice of $k_n$ can be validated by computing on each ball
$B(t,h_n)$, $t \in T$, the $\chi^2$ distance to the standard exponential distribution. The
histogram of these distances is superimposed in Figure to the theoretical
density of the corresponding $\chi^2$ distribution. For instance, at level 5%, the
$\chi^2$ goodness-of-fit test rejects the exponential assumption in 5.7% of the
balls. The resulting conditional Zipf estimator is presented in Figure 4.
The obtained values are located in the interval $[0.2, 0.7]$. It appears that
the estimated tail index is almost independent of the year but strongly
dependent on the day. The heaviest tails are obtained in September, which
means that, during this month, extreme flows are more likely than during
the rest of the year.
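The min-max selection rule can be sketched as a grid search. The code below is only an illustration under stated assumptions: the data are synthetic (a covariate uniform on $[0,1]^2$ and an exact Pareto response with $\gamma(x) = 0.3 + 0.2x_1$), balls use the sup-norm, and the candidate grids for $h_n$ and $k_n$, the evaluation points and all function names are arbitrary or hypothetical; it does not reproduce the Chelmer river study.

```python
import numpy as np

def window(y, x, t, h):
    """Sorted responses whose covariates lie in the sup-norm ball B(t, h)."""
    return np.sort(y[np.max(np.abs(x - t), axis=1) <= h])

def hill(z, k):
    """Conditional Hill estimator (12) on a sorted window sample."""
    m = z.size
    i = np.arange(1, k + 1)
    return np.mean(i * np.log(z[m - i] / z[m - i - 1]))

def zipf(z, k):
    """Least-squares Zipf estimator (13) on a sorted window sample."""
    m = z.size
    tau = np.cumsum(1.0 / np.arange(m, 0, -1))[::-1][:k]   # tau_i = sum_{j=i}^m 1/j
    dev = tau - tau.mean()
    return np.sum(dev * np.log(z[m - np.arange(1, k + 1)])) / np.sum(dev * tau)

def select_h_k(y, x, grid_t, h_values, k_values):
    """Pick (h, k) minimizing max_t |Hill(t) - Zipf(t)| over the evaluation grid."""
    best = (np.inf, None, None)
    for h in h_values:
        windows = [window(y, x, t, h) for t in grid_t]
        for k in k_values:
            if any(z.size <= k for z in windows):
                continue                      # k must stay below m in every ball
            crit = max(abs(hill(z, k) - zipf(z, k)) for z in windows)
            if crit < best[0]:
                best = (crit, h, k)
    return best

rng = np.random.default_rng(3)
x = rng.uniform(size=(4000, 2))
y = rng.uniform(size=4000) ** (-(0.3 + 0.2 * x[:, 0]))    # gamma(x) = 0.3 + 0.2 x_1
grid_t = np.array([[0.3, 0.3], [0.5, 0.5], [0.7, 0.7]])
crit, h_sel, k_sel = select_h_k(y, x, grid_t, [0.1, 0.2], [20, 50, 100])
print(crit, h_sel, k_sel)
```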



6 Proofs

For the sake of simplicity, in the sequel, we write $k_t$ for $k_{n,t}$, $b_t$ for $b_{n,t}$, $m_t$
for $m_{n,t}$ and $h_t$ for $h_{n,t}$.



6.1 Preliminary results 

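This first lemma provides sufficient conditions on $\gamma$ and $\ell$ to obtain (A.2).

Lemma 1 Assume that the first-order condition (4) holds. If, moreover,
there exist positive constants $z_\ell$, $c_\ell$, $c_\gamma$, $\alpha_\gamma \le 1$ and $\alpha_\ell \le 1$ such that for all
$x \in B(t,1)$,

$$|\gamma(x) - \gamma(t)| \le c_\gamma\, d^{\alpha_\gamma}(x,t)$$

and

$$\sup_{z \ge z_\ell}\left|\frac{\ell(z,x)}{\ell(z,t)} - 1\right| \le c_\ell\, d^{\alpha_\ell}(x,t),$$

then (A.2) is verified with $\alpha_U = \min(\alpha_\ell,\alpha_\gamma)$.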
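Proof. Under (4), we have

$$\frac{\log U(z,x)}{\log U(z,t)} - 1 = \frac{(\gamma(x)-\gamma(t))\log(z) + \log\left(\ell(z,x)/\ell(z,t)\right)}{\gamma(t)\log(z)\left(1 + \log(\ell(z,t))/(\gamma(t)\log(z))\right)}.$$

Using the well-known property of slowly varying functions $\log\ell(z,t)/\log(z) \to 0$
as $z \to \infty$, and taking into account that $\gamma(t) > 0$, it follows that, for $z$
large enough,

$$\left|\frac{\log U(z,x)}{\log U(z,t)} - 1\right| \le \frac{2\,|\gamma(x)-\gamma(t)|}{\gamma(t)} + \left|\log\frac{\ell(z,x)}{\ell(z,t)}\right| \le \frac{2c_\gamma}{\gamma(t)}\,d^{\alpha_\gamma}(x,t) + 2c_\ell\,d^{\alpha_\ell}(x,t),$$

since $u > 1/2$ entails $|\log u| \le 2|u-1|$. Finally, since $d(x,t) \le 1$ on $B(t,1)$, both powers of $d(x,t)$ are bounded by $d^{\min(\alpha_\ell,\alpha_\gamma)}(x,t)$, and the conclusion follows. ■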
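The elementary inequality used in the last step, $|\log u| \le 2|u-1|$ for $u > 1/2$, follows from $\log u \le u - 1$ for $u \ge 1$ and $|\log u| = \log(1/u) \le (1-u)/u < 2(1-u)$ for $1/2 < u < 1$. A quick numerical check on a grid, together with a value of $u$ well below $1/2$ where the inequality fails, is sketched below; the grid bounds are arbitrary.

```python
import numpy as np

# Grid on (1/2, 50]: the left endpoint 1/2 itself is dropped.
u = np.linspace(0.5, 50.0, 200_001)[1:]
lhs = np.abs(np.log(u))
rhs = 2.0 * np.abs(u - 1.0)
ok = bool(np.all(lhs <= rhs + 1e-12))   # |log u| <= 2|u - 1| everywhere on the grid
print(ok)

# Away from the range u > 1/2 the bound can fail, e.g. at u = 0.1.
print(abs(np.log(0.1)), 2 * abs(0.1 - 1))
```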



The next lemma provides sufficient conditions on the weights to verify condition (B.1).

Lemma 2 Let $W(\cdot,t)$ be a differentiable function on $(0,1)$. If $sW(s,t) \to 0$
as $s \to 0$, then (5) holds with $u(s,t) = d(sW(s,t))/ds$. Furthermore, if there
exists a positive and monotone function $\phi(\cdot,t)$ defined on $(0,1)$ such that
$\max(|u(s,t)|,|W(s,t)|) \le \phi(s,t)$, $\phi(1,t) < \infty$ and $\phi(\cdot,t)$ is integrable at the
origin, then (6) and (7) are satisfied.

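Proof. Clearly, since $W(\cdot,t)$ is a differentiable function with $sW(s,t) \to 0$
as $s \to 0$, the function $s \mapsto sW(s,t)$ is absolutely continuous with $u(s,t) =
d(sW(s,t))/ds$. Furthermore, for all $j = 2,\dots,k_t$,

$$k_t\left|\int_{(j-1)/k_t}^{j/k_t} u(\eta,t)\,d\eta\right| \le \sup_{s\in[(j-1)/k_t,\,j/k_t]}\phi(s,t).$$

Since $\phi(\cdot,t)$ is monotone on $(0,1)$, we have:

$$\sup_{s\in[(j-1)/k_t,\,j/k_t]}\phi(s,t) \le \begin{cases}\phi\left(\frac{j-1}{k_t},t\right) \le \phi\left(\frac{j}{2(k_t+1)},t\right) & \text{if } \phi(\cdot,t) \text{ is decreasing},\\[4pt] \phi\left(\frac{j}{k_t},t\right) \le \phi\left(\frac{2j}{k_t+1},t\right) & \text{if } \phi(\cdot,t) \text{ is increasing}.\end{cases}$$

For $j = 1$, we have

$$k_t\left|\int_0^{1/k_t} u(\eta,t)\,d\eta\right| = |W(1/k_t,t)| \le \phi(1/k_t,t) \le g\left(\frac{1}{k_t+1},t\right),$$

where

$$g(s,t) = \begin{cases}\phi(s/2,t) & \text{if } \phi(\cdot,t) \text{ is decreasing},\\ \phi(2s,t) & \text{if } \phi(\cdot,t) \text{ is increasing}.\end{cases}$$

As a conclusion, condition (6) is verified. From the Cauchy-Schwarz inequality,
to prove (7), it only remains to verify that $\int_0^1 g(s,t)\,ds < +\infty$. This is a
consequence of the integrability of $\phi(\cdot,t)$ at the origin. ■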
We now provide an example of multidimensional design points and of a
distance $d$ satisfying condition (11). In simple words, Lemma 3 states that, if
the $n$ covariates are distributed on a "rectangular" grid in $\mathbb{R}^p$, the proportion
of points in $B(t,h_{n,t})$ is asymptotically proportional to the volume of this
ball. See [19], Lemma 13.13, for a similar result in the random design setting.

Lemma 3 Let $E = \mathbb{R}^p$, $d(x,t) = \|x - t\|_\infty$ and let $G$ be a $p$-dimensional
cumulative distribution function associated with a density function $g$ such that
$g(t) \neq 0$ for all $t$ in a bounded set. Assume that $G$ admits independent
margins $G_1,\dots,G_p$, that $n^{1/p} \in \mathbb{N}$, and define the lattice $\mathcal{L} = \{1,2,\dots,n^{1/p}\}^p \subset
\mathbb{R}^p$. We define the multidimensional design by $\{x_\beta,\ \beta \in \mathcal{L}\}$, where $\beta =
(\beta_1,\dots,\beta_p) \in \mathbb{N}^p$ is a multi-index, such that each coordinate of $x_\beta$ is
given by

$$(x_\beta)_j = G_j^{-1}\left(\beta_j/n^{1/p}\right), \quad j = 1,\dots,p.$$

Suppose $nh_t^p \to \infty$; then $\varphi(h_t) = (2h_t)^p g(t)(1+o(1))$.
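Proof. Using the above definitions, we have

$$\varphi(h_t) = \frac{1}{n}\sum_{\beta\in\mathcal{L}}\mathbb{I}\{x_\beta \in B(t,h_t)\} = \frac{1}{n}\prod_{j=1}^{p}\sum_{\beta_j=1}^{n^{1/p}}\mathbb{I}\left\{G_j(t_j - h_t) \le \frac{\beta_j}{n^{1/p}} \le G_j(t_j + h_t)\right\} = \prod_{j=1}^{p}\frac{1}{n^{1/p}}\sum_{\beta_j=1}^{n^{1/p}} Q_j\left(\frac{\beta_j}{n^{1/p}}\right), \qquad (15)$$

where we have introduced the indicator function

$$Q_j(u) = \mathbb{I}\{G_j(t_j - h_t) \le u \le G_j(t_j + h_t)\}$$

for $u \in [0,1]$. The above Riemann sums can be approximated as

$$\frac{1}{n^{1/p}}\sum_{\beta_j=1}^{n^{1/p}} Q_j\left(\frac{\beta_j}{n^{1/p}}\right) = \int_0^1 Q_j(u)\,du + O(n^{-1/p}) = G_j(t_j + h_t) - G_j(t_j - h_t) + O(n^{-1/p}) = 2h_t\,g_j(t_j) + o(h_t) + O(n^{-1/p}) = 2h_t\,g_j(t_j)(1 + o(1)),$$

since we assumed that $g(t) \neq 0$ and $nh_t^p \to \infty$. Replacing in (15) and using $g(t) = \prod_{j=1}^{p} g_j(t_j)$ (independent margins), the result
follows. ■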
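Lemma 3 can be illustrated on a small lattice. Assuming the design coordinates $(x_\beta)_j = G_j^{-1}(\beta_j/n^{1/p})$, the sketch below builds the full design for $p = 2$ with hypothetical margins $G_1(u) = u^2$ and $G_2(u) = u$ on $[0,1]$, so that $g(t) = 2t_1$, and compares the proportion of design points in the sup-norm ball $B(t,h)$ with $(2h)^p g(t)$. The grid size, the point $t$, the radius $h$ and the function names are arbitrary choices; with $n^{1/p} = 400$ the two numbers agree up to a few percent, the residual discrepancy coming from lattice points near the boundary of the ball.

```python
import numpy as np

def grid_design(N, inv_cdfs):
    """Lattice design: (x_beta)_j = G_j^{-1}(beta_j / N), beta_j = 1..N, N = n**(1/p)."""
    u = np.arange(1, N + 1) / N
    axes = [inv(u) for inv in inv_cdfs]
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack([m.ravel() for m in mesh], axis=1)

def phi(x, t, h):
    """Proportion of design points in the sup-norm ball B(t, h)."""
    return float(np.mean(np.max(np.abs(x - t), axis=1) <= h))

# Hypothetical margins: G_1(u) = u^2 (inverse sqrt, density 2u) and G_2(u) = u.
x = grid_design(400, [np.sqrt, lambda u: u])
t, h = np.array([0.7, 0.4]), 0.05
g_t = 2 * t[0] * 1.0                      # product of the marginal densities at t
print(phi(x, t, h), (2 * h) ** 2 * g_t)
```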



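6.2 Proofs of main results

Proof of Theorem 1. Under (A.1), we have $\{Z_{m_t-i+1,m_t}(t)\}_i \stackrel{d}{=} \{U(V_{i,m_t}^{-1},x_i)\}_i$,
where $V_{1,m_t} \le \dots \le V_{m_t,m_t}$ are the order statistics associated with the sample
$V_1,\dots,V_{m_t}$ of independent uniform variables. It follows that:

$$\{\log(Z_{m_t-i+1,m_t}(t))\}_i \stackrel{d}{=} \left\{\log\left(U(V_{i,m_t}^{-1},t)\right)\left(1 + \frac{\log U(V_{i,m_t}^{-1},x_i)}{\log U(V_{i,m_t}^{-1},t)} - 1\right)\right\}_i \stackrel{\text{def}}{=} \left\{\log\left(U(V_{i,m_t}^{-1},t)\right)(1 + \varepsilon_{n,i})\right\}_i.$$

Now, assumption (C) entails that for all $i = 1,\dots,k_t$,

$$V_{i,m_t}^{-1} \ge V_{k_t,m_t}^{-1} = (m_t/k_t)(1 + o_P(1)) \to \infty,$$

which implies that, for $n$ large enough, $V_{i,m_t}^{-1} \ge z_U$ for all $i = 1,\dots,k_t$.
Consequently, (A.2) implies that

$$\max_{i=1,\dots,k_t}|\varepsilon_{n,i}| \le c_U h_t^{\alpha_U},$$

and we thus have $\{\log(Z_{m_t-i+1,m_t}(t))\}_i \stackrel{d}{=} \{\log(U(V_{i,m_t}^{-1},t))(1 + O_P(h_t^{\alpha_U}))\}_i$. The
end of the proof is then a direct consequence of the following result (see [1],
Theorems 2.1 and 2.2, for a proof):

$$\left\{i\log\frac{U(V_{i,m_t}^{-1},t)}{U(V_{i+1,m_t}^{-1},t)}\right\}_i \stackrel{d}{=} \left\{\left(\gamma(t) + b_t\left(\frac{i}{k_t+1}\right)^{-\rho(t)}\right)F_i + \beta_{i,n}(t) + o_P(b_t)\right\}_i,$$

where $\{F_i = i\log(V_{i+1,m_t}/V_{i,m_t})\}_i$ are independent standard exponential
variables and with (under (B.1))

$$\frac{1}{k_t}\sum_{i=1}^{k_t} W(i/k_t,t)\,\beta_{i,n}(t) = o_P(b_t).$$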

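Proof of Theorem 2. From Theorem 1, we have

$$\sum_{i=1}^{k_t} C_{i,n}(t)W(i/k_t,t) \stackrel{d}{=} \left(1 + O_P(h_t^{\alpha_U})\right)\gamma(t)\sum_{i=1}^{k_t} W(i/k_t,t)F_i + \left(1 + O_P(h_t^{\alpha_U})\right)b_t\sum_{i=1}^{k_t} W(i/k_t,t)\left(\frac{i}{k_t+1}\right)^{-\rho(t)}F_i + \left(1 + O_P(h_t^{\alpha_U})\right)\sum_{i=1}^{k_t} W(i/k_t,t)\beta_{i,n}(t) + o_P(b_t)\sum_{i=1}^{k_t}|W(i/k_t,t)|.$$

Introducing

$$T_{1,n} = \sum_{i=1}^{k_t} W(i/k_t,t)(F_i - 1), \qquad T_{2,n} = \sum_{i=1}^{k_t} W(i/k_t,t)\left(\frac{i}{k_t+1}\right)^{-\rho(t)}(F_i - 1),$$

$$T_{3,n} = \sum_{i=1}^{k_t} W(i/k_t,t)\beta_{i,n}(t), \qquad T_{4,n} = b_t\sum_{i=1}^{k_t} W(i/k_t,t)\left(\frac{i}{k_t+1}\right)^{-\rho(t)},$$

$$T_{5,n} = \sum_{i=1}^{k_t} W(i/k_t,t), \qquad T_{6,n} = \sum_{i=1}^{k_t}|W(i/k_t,t)|, \qquad T_{7,n} = \left(\sum_{i=1}^{k_t} W^2(i/k_t,t)\right)^{1/2},$$

we obtain the following expansion:

$$\frac{T_{5,n}}{T_{7,n}}\left(\hat\gamma_n(t,W) - \gamma(t) - \frac{T_{4,n}}{T_{5,n}}\right) \stackrel{d}{=} \gamma(t)\frac{T_{1,n}}{T_{7,n}} + b_t\frac{T_{2,n}}{T_{7,n}} + \frac{T_{3,n}}{T_{7,n}} + O_P(h_t^{\alpha_U})\left(\gamma(t)\frac{T_{1,n} + T_{5,n}}{T_{7,n}} + b_t\frac{T_{2,n}}{T_{7,n}} + \frac{T_{3,n} + T_{4,n}}{T_{7,n}}\right) + o_P\left(b_t\frac{T_{6,n}}{T_{7,n}}\right). \qquad (16)$$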

Let $\delta$ be defined by (B.2). From the Lindeberg theorem, a sufficient condition
for $T_{1,n}/T_{7,n} \xrightarrow{d} \mathcal{N}(0,1)$ is that

$$\sum_{i=1}^{k_t}|W(i/k_t,t)|^{2+\delta} \Big/ T_{7,n}^{2+\delta} \to 0. \qquad (17)$$

Since, for any integrable function $\psi$, the following convergence of Riemann
sums holds,

$$\frac{1}{k_t}\sum_{i=1}^{k_t}\psi(i/k_t) \to \int_0^1\psi(s)\,ds, \qquad (18)$$

it follows that $T_{7,n} = k_t^{1/2}AV(t,W)^{1/2}(1 + o(1))$. Thus, using again (18),

$$\sum_{i=1}^{k_t}|W(i/k_t,t)|^{2+\delta} \Big/ T_{7,n}^{2+\delta} = O(k_t^{-\delta/2}) \to 0,$$

showing that condition (17) is satisfied and

$$T_{1,n}/T_{7,n} \xrightarrow{d} \mathcal{N}(0,1). \qquad (19)$$
Next, we focus on the term $T_{2,n}/T_{7,n}$. This term is centered and its variance is finite, so we can conclude that
$$T_{2,n}/T_{7,n} = O_P(1). \tag{20}$$
Theorem 1 shows that
$$T_{3,n}/T_{7,n} = o_P(k_t^{1/2} b_t) = o_P(1). \tag{21}$$
Repeated use of (18) gives
$$T_{4,n}/T_{5,n} = b_t AB(t,W)(1 + o(1)), \tag{22}$$
$$T_{4,n}/T_{7,n} = O(k_t^{1/2} b_t) = O(1), \tag{23}$$
$$T_{5,n}/T_{7,n} = k_t^{1/2} AV(t,W)^{-1/2}(1 + o(1)), \tag{24}$$
$$T_{6,n}/T_{7,n} = O(k_t^{1/2}). \tag{25}$$
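The Riemann-sum limits (22) and (24) can be checked numerically. The sketch below again uses the illustrative weight $W(s,t) = -\log s$. For this weight, $\int_0^1 W = 1$, and $AV(t,W) = \int_0^1 W^2 = 2$ (assuming $AV(t,W) = \int_0^1 W^2(s,t)\,ds$, as (24) and the proof of Proposition 2 suggest). We also assume $AB(t,W) = \int_0^1 W(s,t)s^{-\rho(t)}\,ds$, which is consistent with (22); this gives $AB(t,W) = 1/(1-\rho(t))^2$. The choices $\rho(t) = -1$ and $b_t = 1$ are arbitrary and serve only to isolate the limits. The function name is hypothetical.

```python
import math

def riemann_limits(k, rho):
    """Return (T_4/(b_t T_5), T_5/(sqrt(k) T_7)) for the weight W(s) = -log(s)."""
    w = [-math.log(i / k) for i in range(1, k + 1)]
    t5 = sum(w)
    t7 = math.sqrt(sum(x * x for x in w))
    # T_4 / b_t = sum_i W(i/k) * (i/(k+1))^(-rho)
    t4_over_b = sum(x * (i / (k + 1)) ** (-rho) for i, x in enumerate(w, start=1))
    return t4_over_b / t5, t5 / (math.sqrt(k) * t7)

rho = -1.0
bias_ratio, scale_ratio = riemann_limits(10000, rho)
print(bias_ratio, 1.0 / (1.0 - rho) ** 2)  # (22): both close to AB = 0.25
print(scale_ratio, 1.0 / math.sqrt(2.0))   # (24): both close to AV^(-1/2)
```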



Replacing (20) to (25) in (16) yields
$$k_t^{1/2} AV(t,W)^{-1/2}\left(\gamma_n(t,W) - \gamma(t) - b_t AB(t,W)\right) \overset{d}{=} \gamma(t)\, T_{1,n}/T_{7,n} + O(k_t^{1/2} h^{\alpha}) + o_P(1),$$
and (19) gives the result. ■
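Proof of Corollary 1. The proof consists in remarking that
$$\begin{aligned}
\gamma_n(t,\mu) - \gamma(t) - b_t AB(t,W)
&= \sum_{i=1}^{k_t} \mu_{i,n}(t)\left(\zeta_{i,n}(t) - \gamma(t) - b_t AB(t,W)\right) \Big/ \sum_{i=1}^{k_t} \mu_{i,n}(t) \\
&= (1 + o(1)) \sum_{i=1}^{k_t} W(i/k_t,t)\left(\zeta_{i,n}(t) - \gamma(t) - b_t AB(t,W)\right) \Big/ \sum_{i=1}^{k_t} W(i/k_t,t) \\
&= (1 + o(1))\left(\gamma_n(t,W) - \gamma(t) - b_t AB(t,W)\right),
\end{aligned}$$
and the conclusion follows from Theorem 2. ■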
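The following sketch makes the family concrete. Consistently with the proof of Corollary 4, it assumes that family (3) consists of weighted means $\sum_{i} \mu_{i,n}(t)\, i \log(Z_{m_t-i+1,m_t}(t)/Z_{m_t-i,m_t}(t)) / \sum_i \mu_{i,n}(t)$. The covariate is ignored: once the window is fixed, the simulated sample plays the role of $Z_1(t), \dots, Z_{m_t}(t)$. The data are exact Pareto draws with $\gamma = 0.5$, so $b_t = 0$. The function name `weighted_log_spacing_estimator` is hypothetical. With $\mu_{i,n} \equiv 1$ the formula reduces to the Hill estimator. The weights $\mu_{i,n}(t) = W(i/k_t,t)$ with $W(s,t) = -\log s$ give a second member of the family.

```python
import math
import random

def weighted_log_spacing_estimator(sample, k, mu):
    """sum_i mu[i-1] * i * log(Z_{m-i+1,m} / Z_{m-i,m}) / sum_i mu[i-1], i = 1..k."""
    z = sorted(sample)  # z[j-1] is the order statistic Z_{j,m}
    m = len(z)
    if not 0 < k < m:
        raise ValueError("need 0 < k < m")
    num = sum(mu[i - 1] * i * math.log(z[m - i] / z[m - i - 1]) for i in range(1, k + 1))
    return num / sum(mu[:k])

# Simulated exact Pareto sample with tail index gamma = 0.5 (no second-order bias).
rng = random.Random(12345)
gamma, m, k = 0.5, 20000, 1000
sample = [(1.0 - rng.random()) ** (-gamma) for _ in range(m)]

hill = weighted_log_spacing_estimator(sample, k, [1.0] * k)
zipf_like = weighted_log_spacing_estimator(sample, k, [-math.log(i / k) for i in range(1, k + 1)])
print(hill, zipf_like)  # both should be close to 0.5
```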

Proof of Corollary 2. The assumption $L(y,x) = 1$ for all $(y,x) \in \mathbb{R}_+ \times \mathbb{R}^p$ implies that $l(y,x) = 1$, and thus (A.3) holds with $b(y,t) = 0$. Furthermore, (A.1) is straightforwardly true. Since $\gamma$ is $\alpha$-Lipschitzian, Lemma 1 entails that (A.2) holds. Choose $h_{n,t} = n^{-1/(p+2\alpha)}$ and $k_{n,t} = n^{2\alpha/(p+2\alpha)}\eta_n$, where $\eta_n \to 0$ arbitrarily slowly. Then condition (C) is verified, since $n h_{n,t}^p/k_{n,t} \to \infty$ and (11) imply $n\varphi(h_{n,t})/k_{n,t} \to \infty$. As a conclusion, Theorem 2 provides the asymptotic normality of the estimator with convergence rate $n^{\alpha/(p+2\alpha)}\eta_n^{1/2}$. ■
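The tuning choices of this proof can be checked with a few lines of arithmetic. The sketch evaluates $h_{n,t}$, $k_{n,t}$ and the convergence rate for one illustrative configuration: $n = 10^6$, $p = 2$, $\alpha = 1$, and a fixed value $\eta = 0.1$ standing in for the sequence $\eta_n \to 0$. The function name is hypothetical. The checks confirm three identities. First, the rate equals $k_{n,t}^{1/2}$. Second, $n h_{n,t}^p / k_{n,t} = 1/\eta_n$. Third, $k_{n,t}^{1/2} h_{n,t}^{\alpha} = \eta_n^{1/2}$, so the $O(k_t^{1/2}h^{\alpha})$ term appearing in the proof of Theorem 2 vanishes when $\eta_n \to 0$.

```python
import math

def corollary2_tuning(n, p, alpha, eta):
    """Window size h, number of log-spacings k and convergence rate for given n, p, alpha, eta."""
    h = n ** (-1.0 / (p + 2 * alpha))
    k = n ** (2 * alpha / (p + 2 * alpha)) * eta
    rate = n ** (alpha / (p + 2 * alpha)) * math.sqrt(eta)
    return h, k, rate

n, p, alpha, eta = 10**6, 2, 1.0, 0.1
h, k, rate = corollary2_tuning(n, p, alpha, eta)
print(h, k, rate)                # about 0.0316, 100, 10
print(n * h**p / k, 1 / eta)     # both about 10
print(math.sqrt(k) * h**alpha, math.sqrt(eta))
```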
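Proof of Corollary 4. Let us first prove that (13) belongs to the extended family (3). First, remark that
$$\tau_{i,n}(t) = \sum_{j=i}^{k_t} \frac{1}{j} + \sum_{j=k_t+1}^{m_t} \frac{1}{j},$$
where the second sum does not depend on $i$. Second, remark that $\sum_{i=1}^{k_t}(\tau_{i,n}(t) - \bar\tau_n(t)) = 0$. The estimator can therefore be rewritten as:
$$\gamma_n(t,\mu^Z) = \sum_{i=1}^{k_t}\left(\tau_{i,n}(t) - \bar\tau_n(t)\right)\log\left(Z_{m_t-i+1,m_t}(t)/Z_{m_t-k_t,m_t}(t)\right) \Big/ \sum_{i=1}^{k_t}\left(\tau_{i,n}(t) - \bar\tau_n(t)\right)\sum_{j=i}^{k_t}\frac{1}{j}. \tag{26}$$
Next, we have
$$\log\left(Z_{m_t-i+1,m_t}(t)/Z_{m_t-k_t,m_t}(t)\right) = \sum_{j=i}^{k_t}\log\left(Z_{m_t-j+1,m_t}(t)/Z_{m_t-j,m_t}(t)\right).$$
Inverting the sums in (26) then shows that (13) belongs to family (3) with
$$\mu^Z_{i,n}(t) = \frac{1}{i}\sum_{j=1}^{i}\left(\tau_{j,n}(t) - \bar\tau_n(t)\right).$$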
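The rewriting above can be verified numerically. The sketch reads (26) as the least-squares slope of $\log Z_{m_t-i+1,m_t}(t)$ on $\tau_{i,n}(t) = \sum_{j=i}^{m_t} 1/j$; this is an interpretation, since the definition (13) lies outside this section. It computes that slope directly, and also through the family-(3) form with weights $\mu^Z_{i,n}(t) = i^{-1}\sum_{j \le i}(\tau_{j,n}(t) - \bar\tau_n(t))$. The second computation assumes that family (3) is the weighted mean of the $i\log(Z_{m_t-i+1,m_t}(t)/Z_{m_t-i,m_t}(t))$. The covariate is ignored, and all function names are hypothetical. The two values should agree up to rounding.

```python
import math
import random

def tau_values(m, k):
    """tau_j = sum_{l=j}^{m} 1/l for j = 1..k, via a backward cumulative sum."""
    acc = sum(1.0 / l for l in range(k + 1, m + 1))
    tau = [0.0] * k
    for j in range(k, 0, -1):
        acc += 1.0 / j
        tau[j - 1] = acc
    return tau

def zipf_slope(sample, k):
    """Least-squares slope of log Z_{m-j+1,m} on tau_j, j = 1..k."""
    z = sorted(sample)
    m = len(z)
    tau = tau_values(m, k)
    tau_bar = sum(tau) / k
    y = [math.log(z[m - j]) for j in range(1, k + 1)]  # log Z_{m-j+1,m}
    num = sum((t - tau_bar) * v for t, v in zip(tau, y))
    den = sum((t - tau_bar) * t for t in tau)
    return num / den

def zipf_as_family(sample, k):
    """Same estimator written as a weighted mean of i * log-spacings."""
    z = sorted(sample)
    m = len(z)
    tau = tau_values(m, k)
    tau_bar = sum(tau) / k
    mu, cum = [], 0.0
    for i in range(1, k + 1):
        cum += tau[i - 1] - tau_bar
        mu.append(cum / i)  # mu_i = (1/i) * sum_{j<=i} (tau_j - tau_bar)
    num = sum(mu[i - 1] * i * math.log(z[m - i] / z[m - i - 1]) for i in range(1, k + 1))
    return num / sum(mu)

rng = random.Random(7)
sample = [(1.0 - rng.random()) ** (-0.5) for _ in range(5000)]
print(zipf_slope(sample, 300), zipf_as_family(sample, 300))
```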

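Second, we prove that, uniformly in $i = 1, \dots, k_t$,
$$\mu^Z_{i,n}(t) = -\log(i/k_t)(1 + o(1)). \tag{27}$$
For the sake of simplicity, we introduce the following notation:
$$S_{i,m_t} = \frac{1}{i}\sum_{j=1}^{i}\sum_{l=j}^{m_t}\frac{1}{l},$$
so that $\mu^Z_{i,n}(t) = S_{i,m_t} - S_{k_t,m_t}$. Furthermore, for $i = 2, \dots, k_t$,
$$S_{i,m_t} = \frac{1}{i}\sum_{j=1}^{i-1}\sum_{l=j}^{m_t}\frac{1}{l} + \frac{1}{i}\sum_{l=i}^{m_t}\frac{1}{l} = \frac{i-1}{i}\,S_{i-1,m_t} + \frac{1}{i}\sum_{l=i}^{m_t}\frac{1}{l} = S_{i-1,m_t} - \frac{1}{i}\left(S_{i-1,m_t} - \sum_{l=i}^{m_t}\frac{1}{l}\right).$$
Moreover,
$$S_{i-1,m_t} - \sum_{l=i}^{m_t}\frac{1}{l} = \frac{1}{i-1}\sum_{j=1}^{i-1}\sum_{l=j}^{i-1}\frac{1}{l} = \frac{1}{i-1}\sum_{l=1}^{i-1} l\cdot\frac{1}{l} = 1.$$
Combining the two displays, we obtain the recursive relation $S_{i,m_t} = S_{i-1,m_t} - 1/i$ for $i = 2, \dots, k_t$. We thus have a simplified expression of the weights:
$$\mu^Z_{i,n}(t) = \begin{cases} \displaystyle\sum_{l=i+1}^{k_t} \frac{1}{l}, & i = 1, \dots, k_t - 1,\\[2mm] 0, & i = k_t. \end{cases}$$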
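The recursion for $S_{i,m_t}$ and the resulting closed form of the weights can be checked in exact rational arithmetic. The sketch below uses Python's `fractions` module with hypothetical function names. It compares $\mu^Z_{i,n}(t) = S_{i,m_t} - S_{k_t,m_t}$, computed from the definition, with $\sum_{l=i+1}^{k_t} 1/l$. Note that the closed form no longer depends on $m_t$.

```python
from fractions import Fraction

def s_value(i, m):
    """S_{i,m} = (1/i) * sum_{j=1}^{i} sum_{l=j}^{m} 1/l, computed exactly."""
    total = sum(Fraction(1, l) for j in range(1, i + 1) for l in range(j, m + 1))
    return total / i

def mu_from_definition(k, m):
    """mu_i = S_{i,m} - S_{k,m} for i = 1..k."""
    s_k = s_value(k, m)
    return [s_value(i, m) - s_k for i in range(1, k + 1)]

def mu_closed_form(k):
    """mu_i = sum_{l=i+1}^{k} 1/l for i = 1..k (empty sum = 0 when i = k)."""
    return [sum((Fraction(1, l) for l in range(i + 1, k + 1)), Fraction(0)) for i in range(1, k + 1)]

print(mu_from_definition(6, 15) == mu_closed_form(6))  # True
```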



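We are now in a position to evaluate the difference between $\mu^Z_{i,n}(t)$ and $-\log(i/k_t)$. For $i = 1, \dots, k_t - 1$,
$$-\log(i/k_t) = \log\left(\prod_{l=i+1}^{k_t}\frac{l}{l-1}\right) = \sum_{l=i+1}^{k_t}\log\left(1 + \frac{1}{l-1}\right),$$
and consequently,
$$-\log(i/k_t) - \mu^Z_{i,n}(t) = \begin{cases}\displaystyle\sum_{l=i+1}^{k_t}\left(\log\left(1 + \frac{1}{l-1}\right) - \frac{1}{l}\right), & i = 1, \dots, k_t - 1,\\[2mm] 0, & i = k_t.\end{cases} \tag{28}$$
For $l \ge 2$, the following inequality holds:
$$\frac{1}{l} < \log\left(1 + \frac{1}{l-1}\right) < \frac{1}{l} + \frac{1}{l^2}.$$
We deduce from (28) that, for $i = 1, \dots, k_t - 1$,
$$0 < -\log(i/k_t) - \mu^Z_{i,n}(t) < \sum_{l=i+1}^{k_t}\frac{1}{l^2}.$$
Furthermore,
$$\frac{1}{l^2} < \frac{1}{l(l-1)} = \frac{1}{l-1} - \frac{1}{l},$$
so that $\sum_{l=i+1}^{k_t} 1/l^2 < 1/i - 1/k_t$. Hence, for $i = 1, \dots, k_t - 1$,
$$0 < 1 - \frac{\mu^Z_{i,n}(t)}{\log(k_t/i)} < \frac{1}{\log(k_t/i)}\left(\frac{1}{i} - \frac{1}{k_t}\right).$$
Finally, the sequence
$$i \mapsto \frac{1}{\log(k_t/i)}\left(\frac{1}{i} - \frac{1}{k_t}\right), \quad i \in [1, k_t[,$$
is decreasing. Therefore, for $i = 1, \dots, k_t - 1$,
$$0 < 1 - \frac{\mu^Z_{i,n}(t)}{\log(k_t/i)} < \frac{1}{\log k_t}\left(1 - \frac{1}{k_t}\right) < \frac{1}{\log k_t},$$
which proves (27), since $k_t \to \infty$. The end of the proof is a consequence of Corollary 1 and Theorem 2. ■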
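The bound just obtained can be illustrated numerically. The sketch below uses a hypothetical function name. It computes $\mu^Z_{i,n}(t) = \sum_{l=i+1}^{k_t} 1/l$ by a backward cumulative sum, then the relative gaps $1 - \mu^Z_{i,n}(t)/\log(k_t/i)$ for $i = 1, \dots, k_t - 1$. Their range is then compared with $(1 - 1/k_t)/\log k_t$. The output shows that the bound decays only logarithmically in $k_t$.

```python
import math

def relative_gaps(k):
    """List of 1 - mu_i / log(k/i), i = 1..k-1, with mu_i = sum_{l=i+1}^{k} 1/l."""
    mu = [0.0] * (k + 1)  # mu[k] = 0
    for i in range(k - 1, 0, -1):
        mu[i] = mu[i + 1] + 1.0 / (i + 1)
    return [1.0 - mu[i] / math.log(k / i) for i in range(1, k)]

for k in (10, 100, 1000, 10000):
    gaps = relative_gaps(k)
    bound = (1.0 - 1.0 / k) / math.log(k)
    print(k, min(gaps), max(gaps), bound)
```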
Proof of Proposition 2. For all $W$ such that $\int_0^1 W(s,t)\,ds = 1$, we have
$$\int_0^1 W^2(s,t)\,ds = \int_0^1 \left(W(s,t) - 1\right)^2 ds + 1,$$
and thus minimizing $\int_0^1 W^2(s,t)\,ds$ is equivalent to minimizing $\int_0^1 (W(s,t) - 1)^2\,ds$. Consequently, the solution of the constrained optimization problem is $W(\cdot,t) = 1$ almost everywhere on $[0,1]$. Since $W$ is assumed to be continuous, $W(s,t) = 1$ for all $s \in [0,1]$. ■



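Proof of Proposition 3. First, we easily check that the function $W^{\mathrm{opt}}(\cdot,t)$ is continuous, that $\int_0^1 W^{\mathrm{opt}}(s,t)\,ds = 1$ and that $\int_0^1 W^{\mathrm{opt}}(s,t)\,s^{-\rho(t)}\,ds = 0$. Next, for every continuous function $W(\cdot,t)$ satisfying $\int_0^1 W(s,t)\,ds = 1$ and $\int_0^1 W(s,t)\,s^{-\rho(t)}\,ds = 0$, we have
$$\int_0^1 W^2(s,t)\,ds = \int_0^1 \left(W(s,t) - W^{\mathrm{opt}}(s,t)\right)^2 ds + \int_0^1 \left(W^{\mathrm{opt}}(s,t)\right)^2 ds.$$
It follows that minimizing $\int_0^1 W^2(s,t)\,ds$ is equivalent to minimizing $\int_0^1 (W(s,t) - W^{\mathrm{opt}}(s,t))^2\,ds$. Since $W(\cdot,t)$ is continuous, the conclusion of the proof is that $W(\cdot,t) = W^{\mathrm{opt}}(\cdot,t)$ on $[0,1]$. ■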
Figure 1: Densities of the asymptotic distributions of $\gamma_n(t, W^Z)$. Solid curve: $\rho^* = 1$; dotted curve: $\rho^* = -0.2$; dashed curve: $\rho^* = -5$. Solid vertical line: true value $\gamma$. Dotted vertical line: $\gamma + b_{n,t}AB(t, W^Z)$, i.e., the mean of the asymptotic distribution when $\rho^* \to -\infty$.

Figure 2: Comparison of the asymptotic bias and variances 

Figure 3: Histogram of the $\chi^2$ distances between the rescaled log-spacings and the standard exponential distribution. The theoretical density of the corresponding $\chi^2$ distribution is superimposed.
Figure 4: Conditional Zipf estimator $\gamma_n(t, \mu^Z)$ of the tail index computed on the real dataset. Two covariates are available: the year, ranging from 1969 to 2005, and the day, ranging from 1 to 365. For the sake of readability, only the first letter of the corresponding month is represented.